Smart Paper Notes

ClassPruning: Speed Up Image Restoration Networks by Dynamic N:M Pruning

Yang Zhou , Yuda Song , Hui Qian , Xin Du

Category: Computer Vision

2022-11-10

Image restoration tasks have achieved tremendous performance improvements with the rapid advancement of deep neural networks. However, most prevalent deep learning models perform inference statically, ignoring that different images have varying restoration difficulties and lightly degraded images can be well restored by slimmer subnetworks. To this end, we propose a new solution pipeline dubbed ClassPruning that utilizes networks with different capabilities to process images with varying restoration difficulties. In particular, we use a lightweight classifier to identify the image restoration difficulty, and then the sparse subnetworks with different capabilities can be sampled based on predicted difficulty by performing dynamic N:M fine-grained structured pruning on base restoration networks. We further propose a novel training strategy along with two additional loss terms to stabilize training and improve performance. Experiments demonstrate that ClassPruning can help existing methods save approximately 40% FLOPs while maintaining performance.
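As a rough illustration of the N:M fine-grained structured pruning mentioned above, the sketch below keeps the N largest-magnitude weights in every group of M consecutive weights and zeroes the rest. The function names and the `DIFFICULTY_TO_N` mapping from a predicted difficulty class to N are hypothetical and chosen only for illustration; the abstract does not specify how the classifier output selects a sparsity level.

```python
def nm_prune(weights, n, m):
    """Keep the n largest-magnitude values in each group of m weights; zero the rest."""
    if len(weights) % m != 0:
        raise ValueError("weight count must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        group = weights[start:start + m]
        # Indices of the n largest magnitudes inside this group.
        keep = set(sorted(range(m), key=lambda i: abs(group[i]), reverse=True)[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(group))
    return pruned


# Hypothetical mapping (an assumption, not from the paper): easier images
# get a sparser subnetwork, with M fixed to 4.
DIFFICULTY_TO_N = {"easy": 1, "medium": 2, "hard": 4}


def prune_for_difficulty(weights, difficulty, m=4):
    """Select a sparsity level from the predicted difficulty and prune accordingly."""
    return nm_prune(weights, DIFFICULTY_TO_N[difficulty], m)


weights = [0.1, -0.9, 0.3, 0.5, 2.0, -0.2, 0.0, 1.0]
print(prune_for_difficulty(weights, "medium"))
```

With M = 4, the "medium" setting corresponds to the 2:4 pattern, which halves the number of nonzero weights in each group.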

Translated by Google Translate

Modular Degradation Simulation and Restoration for Under-Display Camera

Yang Zhou , Yuda Song , Xin Du

Category: Computer Vision

2022-09-23

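The under-display camera (UDC) provides an elegant solution for full-screen smartphones. However, because the sensor lies under the display, images captured by a UDC suffer from severe degradation. Although this issue can be tackled by image restoration networks, these networks require large-scale image pairs for training. To this end, we propose a modular network, dubbed MPGNet, that uses the generative adversarial network (GAN) framework to simulate UDC imaging. Specifically, we note that the UDC imaging degradation process contains brightness attenuation, blurring, and noise corruption. We therefore model each degradation with a characteristic-related modular network and cascade all modular networks to form the generator. Together with a pixel-wise discriminator and a supervised loss, we can train the generator to simulate the UDC imaging degradation process. Furthermore, we present a Transformer-style network named DWFormer for UDC image restoration. For practical purposes, we use depth-wise convolution instead of multi-head self-attention to aggregate local spatial information. Moreover, we propose a novel channel attention module to aggregate global information, which is critical for brightness recovery. We conduct evaluations on the UDC benchmark, and our method surpasses the previous state-of-the-art models on the P-OLED track and by 0.71 dB on the T-OLED track.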
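The cascade of brightness attenuation, blurring, and noise corruption can be pictured with a toy pipeline on a 1-D signal. This is only a minimal sketch under strong assumptions: the three stages here are fixed, hand-written operators (a gain, a box blur, and Gaussian noise), whereas the abstract describes learned modular networks trained adversarially. All function names and default parameter values are hypothetical.

```python
import random


def attenuate(signal, gain):
    """Brightness attenuation: scale every sample by a constant gain."""
    return [gain * v for v in signal]


def box_blur(signal, radius):
    """Blur: average each sample with its neighbours inside the window."""
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def add_noise(signal, sigma, rng):
    """Noise corruption: add zero-mean Gaussian noise to every sample."""
    return [v + rng.gauss(0.0, sigma) for v in signal]


def simulate_udc(signal, gain=0.5, radius=1, sigma=0.01, seed=0):
    """Cascade the three stages in the order named in the abstract."""
    rng = random.Random(seed)
    return add_noise(box_blur(attenuate(signal, gain), radius), sigma, rng)


clean = [0.0, 1.0, 1.0, 0.0, 1.0]
print(simulate_udc(clean))
```

Applying `simulate_udc` to clean signals yields (clean, degraded) pairs, which is the role the simulated data plays when training a restoration network.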


Rethinking Performance Gains in Image Dehazing Networks

Yuda Song , Yang Zhou , Hui Qian , Xin Du

Category: Computer Vision

2022-09-23

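Image dehazing is an active topic in low-level vision, and many image dehazing networks have been proposed with the rapid development of deep learning. Although the pipelines of these networks work well, the key mechanism for improving image dehazing performance remains unclear. For this reason, we do not aim to propose a dehazing network with fancy modules. Instead, we make minimal modifications to the popular U-Net to obtain a compact dehazing network. Specifically, we replace the convolutional blocks in U-Net with blocks that use a gating mechanism, fuse the skip connections using a selective kernel, and call the resulting U-Net variant gUNet. As a result, with significantly reduced overhead, gUNet outperforms state-of-the-art methods on multiple image dehazing datasets. Finally, we verify the contribution of these key designs to the performance gain of image dehazing networks through extensive ablation studies.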


Multi-Curve Translator for High-Resolution Photorealistic Image Translation

Yuda Song , Hui Qian , Xin Du

Category: Computer Vision

2022-03-15

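The dominant image-to-image translation methods are based on fully convolutional networks, which extract and translate an image's features and then reconstruct the image. However, their computational cost is unacceptable when working with high-resolution images. To this end, we present the Multi-Curve Translator (MCT), which predicts the translated pixels not only for the corresponding input pixels but also for their neighboring pixels. If a high-resolution image is downsampled to a low-resolution version, the lost pixels are neighbors of the remaining pixels. MCT therefore makes it possible to feed the network only the downsampled image while performing the mapping for the full-resolution image, which dramatically lowers the computational cost. In addition, MCT is a plug-in approach that utilizes existing base models and only requires replacing their output layers. Experiments demonstrate that MCT variants can process 4K images in real time and achieve comparable or even better performance than the base models on various photorealistic image-to-image translation tasks.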
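One way to picture the idea is that each low-resolution location carries a translation curve, and every full-resolution pixel covered by that location is mapped through it. The sketch below assumes piecewise-linear curves with evenly spaced knots on [0, 1], a single channel, and an integer downsampling factor; the curves are supplied by hand here instead of being predicted by a network. These are illustrative assumptions, not details confirmed by the abstract, and the function names are hypothetical.

```python
def apply_curve(value, knots):
    """Evaluate a piecewise-linear curve with evenly spaced knots on [0, 1].

    Requires at least two knots.
    """
    value = min(max(value, 0.0), 1.0)
    pos = value * (len(knots) - 1)
    left = min(int(pos), len(knots) - 2)
    frac = pos - left
    return knots[left] * (1.0 - frac) + knots[left + 1] * frac


def translate_full_res(image, curves, scale):
    """Map each full-res pixel through the curve of the low-res cell covering it."""
    return [
        [apply_curve(px, curves[y // scale][x // scale]) for x, px in enumerate(row)]
        for y, row in enumerate(image)
    ]


# A 2x4 full-resolution image and a 1x2 grid of curves (scale factor 2).
image = [[0.0, 0.5, 0.0, 0.5],
         [1.0, 0.25, 1.0, 0.25]]
curves = [[[0.0, 1.0], [1.0, 0.0]]]  # left cell: identity, right cell: inversion
print(translate_full_res(image, curves, scale=2))
```

The cost of producing the curves depends only on the low-resolution grid, while applying them to the full-resolution image is a cheap per-pixel lookup.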


Prediction of Gender from Longitudinal MRI data via Deep Learning on Adolescent Data Reveals Unique Patterns Associated with Brain Structure and Change over a Two-year Period

Yuda Bi , Anees Abrol , Zening Fu , Jiayu Chen , Jingyu Liu , Vince Calhoun

Category: Computer Vision | Machine Learning

2022-09-15

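Deep learning algorithms for prediction from neuroimaging data have shown great promise in various applications. Prior work has demonstrated that deep learning models that exploit the 3D structure of the data can outperform standard machine learning on several learning tasks. However, most prior research in this area has focused on neuroimaging data from adults. Within the Adolescent Brain and Cognitive Development (ABCD) dataset, a large longitudinal developmental study, we examine structural MRI data to predict gender and identify gender-related changes in brain structure. Results demonstrate that gender prediction accuracy is exceptionally high (> 97%) with > 200 training epochs, and that this accuracy increases with age. The brain regions identified as most discriminative for the task under study include predominantly frontal areas and the temporal lobe. When evaluating the changes in gender prediction associated with a two-year increase in age, a broader set of visual, cingulate, and insular regions is revealed. Our findings show a pattern of gender-related structural change even over a small age range. This suggests that it may be possible to study how the brain changes during adolescence by examining how these changes relate to different behavioral and environmental factors.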


Deadlock Resolution and Feasibility Guarantee in MPC-based Multi-robot Trajectory Generation

Yuda Chen , Meng Guo , Zhongkui Li

Category: Robotics

2022-02-12

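Online collision-free trajectory generation within a shared workspace is fundamental for most multi-robot applications. However, many widely used methods based on model predictive control (MPC) lack theoretical guarantees on the feasibility of the underlying optimization. Furthermore, when applied in a distributed manner without a central coordinator, deadlocks often occur in which robots block each other indefinitely. Although heuristic methods such as introducing random perturbations exist, no in-depth analysis has been given to validate these measures. To this end, we propose a systematic method called infinite-horizon model predictive control with deadlock resolution. The MPC is formulated as a convex optimization over the proposed modified Voronoi with warning band. Based on this formulation, the condition for deadlocks is formally analyzed and proven to be analogous to a force equilibrium. A detection-resolution scheme is proposed that can effectively detect deadlocks online, even before they happen. Once a deadlock is detected, an adaptive resolution scheme is used to resolve it, with theoretical guarantees on performance. In addition, the proposed planning algorithm ensures recursive feasibility of the underlying optimization at each time step under both input and model constraints, is concurrent for all robots, and requires only local communication. Comprehensive simulation and experimental studies are conducted on large-scale multi-robot systems. Significant improvements in success rate are reported compared with other state-of-the-art methods, especially in crowded and high-speed scenarios.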
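For intuition about Voronoi-based collision avoidance, the sketch below checks whether a 2-D point lies inside a plain buffered Voronoi cell: the region closer to the robot than to any neighbor, shrunk by a safety radius along each bisector. This is a simpler, swapped-in construction used purely for illustration. It does not include the warning band or the modifications the abstract refers to, and the function name and arguments are hypothetical.

```python
import math


def in_buffered_voronoi_cell(point, own, others, radius):
    """Return True if `point` stays at least `radius` inside every bisector.

    `own` and each entry of `others` are (x, y) robot positions, assumed distinct.
    """
    for other in others:
        dx, dy = other[0] - own[0], other[1] - own[1]
        dist = math.hypot(dx, dy)
        mid = ((own[0] + other[0]) / 2.0, (own[1] + other[1]) / 2.0)
        # Signed distance of `point` past the bisector, measured toward the neighbour.
        offset = ((point[0] - mid[0]) * dx + (point[1] - mid[1]) * dy) / dist
        if offset + radius > 0.0:
            return False
    return True


own, neighbour = (0.0, 0.0), (4.0, 0.0)
print(in_buffered_voronoi_cell((1.0, 0.0), own, [neighbour], radius=0.5))
print(in_buffered_voronoi_cell((1.8, 0.0), own, [neighbour], radius=0.5))
```

Each neighbor contributes one linear inequality on the point, so constraining planned positions to such a cell keeps the optimization convex.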


MGTAB: A Multi-Relational Graph-Based Twitter Account Detection Benchmark

Shuhao Shi , Kai Qiao , Jian Chen , Shuai Yang , Jie Yang , Baojie Song , Linyuan Wang , Bin Yan

Category: Computer Vision

2023-01-03

The development of social media user stance detection and bot detection methods relies heavily on large-scale, high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, which hinders graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built from the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
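Selecting the property features with the greatest information gain can be sketched as follows for discrete features. Information gain here is the label entropy minus the label entropy conditioned on the feature value. The feature names and toy data are invented for illustration, and the abstract does not say how continuous properties were handled, so this sketch assumes they are already discretized.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Label entropy minus the entropy remaining after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(features, labels, k):
    """Rank feature names by information gain and return the best k."""
    ranked = sorted(features, key=lambda name: information_gain(features[name], labels),
                    reverse=True)
    return ranked[:k]


# Toy data with hypothetical feature names.
labels = ["bot", "bot", "human", "human"]
features = {
    "default_profile_image": [1, 1, 0, 0],  # separates the classes perfectly
    "has_location": [1, 0, 1, 0],           # carries no information here
}
print(top_k_features(features, labels, k=1))
```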


EZInterviewer: To Improve Job Interview Performance with Mock Interview Generator

Mingzhe Li , Xiuying Chen , Weiheng Liao , Yang Song , Tao Zhang , Dongyan Zhao , Rui Yan

Category: Natural Language Processing

2023-01-03

The interview is regarded as one of the most crucial steps in recruitment. To prepare fully for interviews with recruiters, job seekers usually practice with mock interviews among themselves. However, such a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are unlikely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to hold online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs as well as resume data, neither of which is low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.


Deep Spectral Q-learning with Application to Mobile Health

Yuhe Gao , Chengchun Shi , Rui Song

Category: Machine Learning (Statistics) | Machine Learning

2023-01-03

Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.


Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

Jiahao Zhu , Daizong Liu , Pan Zhou , Xing Di , Yu Cheng , Song Yang , Wenzheng Xu , Zichuan Xu , Yao Wan , Lichao Sun

Category: Computer Vision

2023-01-02

Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two important issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as the corresponding start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as the new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames that enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such a mechanism is also able to supplement the sampled sparse frames with the absent consecutive visual semantics for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
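The boundary-bias issue can be made concrete with a small sketch: when a fixed number of frames is sampled uniformly, the annotated start and end frames are usually not among them, so the boundaries shift to the nearest sampled frames. The uniform sampling rule and the nearest-frame snapping below are assumptions chosen for illustration; the abstract does not specify the exact sampling scheme, and the function names are hypothetical.

```python
def sample_frames(num_frames, num_samples):
    """Pick num_samples frame indices spread evenly over [0, num_frames - 1]."""
    if num_samples == 1:
        return [0]
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


def snapped_boundaries(start, end, sampled):
    """Replace the annotated boundaries with the nearest sampled frames."""
    new_start = min(sampled, key=lambda f: abs(f - start))
    new_end = min(sampled, key=lambda f: abs(f - end))
    return new_start, new_end


sampled = sample_frames(num_frames=101, num_samples=5)
print(sampled)
print(snapped_boundaries(30, 60, sampled))
```

In this toy case the annotated segment covers frames 30 to 60, but after sampling the model only sees frames 25 and 50 as its boundaries, which is the kind of shift that additional contextual frames are meant to correct.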
